Using Group Membership Markers for Group Identification in Web Logs
Authors
Abstract
We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and for prioritizing the data found. Our ranking system employs a small hand-selected vocabulary based on group membership markers used by insiders to identify members and member properties (us) and outsiders and threats (them). We use the same vocabulary to build a classifier. Evaluating several ranking systems by their correlations with human judgments, we show that the best ranker uses the small us-them vocabulary, outperforming one system with a much larger vocabulary and another with a small vocabulary chosen by Mutual Information. We confirm and extend recent results in sentiment analysis (Paltoglou and Thelwall 2010), showing that a feature-weighting scheme taken from classical IR (TFIDF) produces the best ranking system; we also find, surprisingly, that adjusting these weights with SVM training produces a better classifier but a worse ranker. Increasing vocabulary size similarly improves classification while worsening ranking. Finally, we experiment with adding usage models to both systems: models of how well each word’s syntactic usage pattern matches the usage pattern in a class model. These models do not benefit ranking, but they increase the precision of the classifier. Our work complements and extends previous work tracking radical groups on the web (Chen 2007; Zhou et al. 2007; Burris, Smith, and Strahm 2000), which classified such sites with heterogeneous indicators, including document, vocabulary, and morphological features. The method combines elements of linguistics, machine learning, and behavioral science, and in principle can be extended to data collection aimed at any group organized for collective action.
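The ranking-and-evaluation idea in the abstract can be illustrated with a minimal sketch: score each document by the TF-IDF weight of a small marker vocabulary, then compare the resulting ranking with human judgments using a rank correlation. Everything concrete here is an assumption made for illustration and is not taken from the paper: the word list `US_THEM_VOCAB`, the toy documents, the human scores, the smoothed IDF formula, and the choice of Spearman correlation.

```python
import math
from collections import Counter

# Hypothetical us/them marker words; the paper's actual vocabulary is not given here.
US_THEM_VOCAB = ["brothers", "comrades", "martyrs", "enemy", "traitors", "infidels"]


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [w.strip(".,;:!?\"'()").lower() for w in text.split()]


def tfidf_scores(docs, vocab):
    """Score each document by the summed TF-IDF weight of the vocabulary terms."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency of each vocabulary term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    # Smoothed IDF (an assumed variant; many TF-IDF weightings exist).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        scores.append(sum((counts[t] / length) * idf[t] for t in vocab))
    return scores


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy documents and made-up human militancy judgments (higher = more militant).
docs = [
    "Our brothers and martyrs will defeat the enemy and the traitors.",
    "The enemy is at the gates, brothers.",
    "The library opens at nine and closes at five.",
]
human = [3, 2, 1]

scores = tfidf_scores(docs, US_THEM_VOCAB)
print(scores)
print(spearman(scores, human))
```

In this toy setup the document with the most marker words per token scores highest and the document with none scores zero, so the system ranking agrees with the made-up human ordering.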
Similar resources
A Comparison of the Weblogs of Iranian Libraries and Librarians with Top Librarianship Weblogs; 1385
Introduction: Web logs are evident tools for librarians. There are three main ways of applying web logs in librarianship: personal use by librarians to update their own knowledge, as a source of information for libraries, and for their services. The aim of this research is to compare Iranian libraries and librarians with superior librarianshi...
Using Group Membership Markers for Group Identification
We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and prioritizing the data found. We compare three ranking systems, one employing a small hand-selected vocabulary based on group membership markers used by insiders to identify members and member properties (us) and outsiders and threats (them), one with a much ...
User Interest Level Based Preprocessing Algorithms Using Web Usage Mining
Web logs play an important role in understanding user behavior. Several pattern mining techniques have been developed to understand this behavior. A specific kind of preprocessing technique improves the quality and accuracy of pattern mining algorithms. Existing algorithms perform preprocessing to reduce the size of the log file and to identify the number of unique users...
Optimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining
Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page is one type of web data; it is regarded as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...
When objective group membership and subjective ethnic identification don’t align: How identification shapes intergroup bias through self-enhancement and perceived threat
When objective group membership and subjective ethnic identification don’t align, which has a greater impact on how people feel towards the groups they affiliate with, and why? Deprived of many distinctiveness markers typically found in intergroup relations (e.g., physical features, obvious status differences), Taiwanese society provides a perfect natural context to explore the impact of object...